Interpreting Microarray Expression Data Using Text Annotating the Genes

Authors

  • Michael Molla
  • Peter Andreae
  • Jeremy D. Glasner
  • Frederick R. Blattner
  • Jude W. Shavlik
Abstract

Microarray expression data is being generated by the gigabyte all over the world, with undoubted exponential increases to come. Annotated genomic data is also rapidly pouring into public databases. Our goal is to develop automated ways of combining these two sources of information to produce insight into the operation of cells under various conditions. Our approach is to use machine-learning techniques to identify characteristics of genes that are up-regulated or down-regulated in a particular microarray experiment. We seek models that are (a) accurate, (b) easy to interpret, and (c) stable to small variations in the training data. This paper explores the effectiveness of two standard machine-learning algorithms for this task: Naïve Bayes (based on probability) and PFOIL (based on building rules). Although we do not anticipate using our learned models to predict expression levels of genes, we cast the task in a predictive framework, and evaluate the quality of the models in terms of their predictive power on genes held out from the training. The paper reports on experiments using actual E. coli microarray data, discussing the strengths and weaknesses of the two algorithms and demonstrating the trade-offs between accuracy, comprehensibility, and stability. © 2002 Published by Elsevier Science Inc.

Author footnote 1: On leave from Victoria University of Wellington, NZ. PII: S0020-0255(02)00216-5.
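The predictive framing in the abstract (learn from the text annotating some genes, then score the model on held-out genes) can be illustrated with a minimal sketch. It assumes a Bernoulli-style Naïve Bayes over word presence with Laplace smoothing, which the abstract does not specify, and it uses a tiny invented data set; the gene annotations, the labels, and the function names `train_naive_bayes`, `predict`, and `holdout_accuracy` are all hypothetical and are not taken from the paper. PFOIL is not shown.

```python
import math
from collections import Counter

# Hypothetical toy data: (words annotating a gene, regulation label).
TRAIN = [
    ({"heat", "shock", "chaperone"}, "up"),
    ({"heat", "shock", "protease"}, "up"),
    ({"chaperone", "stress"}, "up"),
    ({"ribosomal", "protein", "subunit"}, "down"),
    ({"ribosomal", "translation"}, "down"),
    ({"flagellar", "biosynthesis", "protein"}, "down"),
]
# Genes held out from training, used only for evaluation.
TEST = [
    ({"heat", "chaperone"}, "up"),
    ({"ribosomal", "subunit"}, "down"),
]

def train_naive_bayes(genes):
    """Count classes and, per class, how many genes contain each word."""
    class_counts = Counter(label for _, label in genes)
    word_counts = {label: Counter() for label in class_counts}
    for words, label in genes:
        word_counts[label].update(words)
    vocab = set().union(*(words for words, _ in genes))
    return class_counts, word_counts, vocab

def predict(model, words):
    """Return the class with the highest log-posterior for a gene's words."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for w in vocab:
            # Laplace-smoothed estimate of P(word present | class).
            p = (word_counts[label][w] + 1) / (n + 2)
            score += math.log(p if w in words else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def holdout_accuracy(train, test):
    """Train on one set of genes and report accuracy on held-out genes."""
    model = train_naive_bayes(train)
    return sum(predict(model, w) == y for w, y in test) / len(test)

if __name__ == "__main__":
    print(holdout_accuracy(TRAIN, TEST))
```

In a sketch like this, the smoothed per-word probabilities are what a reader would inspect to interpret the model, for example to see which annotation words are more likely among up-regulated genes than among down-regulated ones.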

Similar resources

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...

Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data

Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of the patients. Gene expression data obtained using DNA microarray in combination with machine learning algorithms can provide new and intelligent methods for diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next-generation sequencing (NGS) data are important sources for finding useful molecular patterns. The large volume of gene expression data also makes it harder to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression data give access to a wealth of information across thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we use a statistical method to reduce the high-dimensional data, selecting high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...

Using the Protein-protein Interaction Network to Identifying the Biomarkers in Evolution of the Oocyte

Background: Oocyte maturity includes nuclear and cytoplasmic maturity, both of which are important for embryo fertilization. The development of the oocyte is not limited to the period of follicular growth; it starts in the embryonic period and continues throughout life. In this study, for the purpose of evaluating the effect of the FSH hormone on the expression of genes, GEO access codes for this...

Journal title:
  • Inf. Sci.

Volume 146

Publication date 2002